ABSTRACT 


Approximate maximum likelihood decoding algorithms, based 
upon selecting a small set of candidate code words with the aid of 
the estimated probability of error of each received symbol, can give 
performance close to optimum with a reasonable amount of computation. 
By combining the best features of various algorithms and taking care 
to perform each step as efficiently as possible, a decoding scheme 
was developed which can decode codes which have better performance 
than those in use today and yet not require an unreasonable amount of 
computation. The discussion of the details and tradeoffs of presently 
known efficient optimum and near optimum decoding algorithms leads, 
naturally, to the one which embodies the best features of all of them. 
SECTION 1 


INTRODUCTION 


The quest for practical error correcting schemes with better 
performance at low signal-to-noise ratios has led to the realization 
that one must use codes of high Q (distance x rate) and decode them with 
algorithms whose probability of error approaches or equals the theoretical 
minimum for the code being used. For codes whose words are equally 
likely and which are transmitted over a memoryless channel, the optimum 
decoder is a correlator which calculates the distance between the received 
vector and all possible code words and selects as the best estimate 
of the transmitted code word that one which is closest to it. Because 
of the amount of computation required, it is unreasonable to decode 
directly in this way all but the smallest codes, whose error correcting 
power is relatively weak. Many schemes have been suggested which reduce 
the computational complexity while maintaining the desired performance. 
These schemes can be roughly divided into two groups: those that use 
code structures for which highly efficient decoding algorithms, equivalent 
to optimum decoders, are known (for example, convolutional codes decoded 
using the Viterbi algorithm, see Reference 1-1); and those that use 
non-optimum decoding algorithms whose performance is close to optimum 
but whose complexity is much less than the optimum algorithm. 

One class of non-optimum decoding algorithms, which select 
a small number of candidate code words to be correlated with the received 
vector, can approach maximum likelihood performance even at low signal- 
to-noise ratios. The best of these algorithms take advantage of a 
preliminary sorting of the bits of the received vector according to 
absolute magnitude in order to arrange them in order of their estimated 
probability of being correct. These algorithms then select candidate 
code words such that, on the average, those bits which are most probably 
correct almost always keep the same value as the hard limited received 
vector, while those which are considered unreliable change sign more 
often. Each of the algorithms discussed here utilizes the constraints 
of the parity check equations in a different way in order to generate 
a set of code words while adhering to this general principle. 

One of the first algorithms of this kind was developed 
by D. Chase (Reference 1-2). He suggested perturbing the hard limited 
received code word X by adding it, modulo 2, to a test pattern T to 
obtain a new sequence X'. This new sequence is decoded algebraically 
to find the unique code word (if one exists) within a distance 
$\lfloor (d-1)/2 \rfloor$ (d = minimum distance between code words) of X'. 
If all the binary sequences of weight $\lfloor d/2 \rfloor$ and of length n 
(n = number of bits in code word) are used for test patterns, all code 
words within a distance d-1 of X will be in the set ($X' = X \oplus T$). 
This two-fold increase over 
the error correcting capability of a conventional binary decoder makes 
up for the loss of non-optimum decoding and, at high signal-to-noise 
ratios, the performance asymptotically approaches that of a maximum 
likelihood decoder. This scheme is not practical, however, since, except 
for short codes of small minimum distance, the number of test sequences 
is very large. The information on the reliability of the received 
symbols can reduce this number to a practical value, for example, by 
discarding test patterns with many ones in bit positions corresponding 
to reliable bits. In this and in a later paper (Reference 1-3), Chase 
describes methods of constructing sets with a reasonable number of 
patterns which not only approach maximum likelihood performance at 
high signal-to-noise ratios, but do so also at low signal-to-noise 
ratios. 
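
The size of the unrestricted test set can be illustrated with a short 
calculation. The sketch below counts the binary sequences of weight 
floor(d/2) for the (128,64) BCH code of minimum distance 22 used in this 
report; the function name is hypothetical and chosen only for illustration. 

```python
from math import comb

def chase_test_pattern_count(n, d):
    """Number of binary sequences of length n and weight floor(d/2)."""
    return comb(n, d // 2)

# (128,64) BCH code with minimum distance 22: patterns of weight 11.
count = chase_test_pattern_count(128, 22)
print(count)
```

The count exceeds 10^15, which is why the reliability information is 
needed to prune the set. 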


L. Baumert, R. McEliece and G. Solomon (References 1-4 
and 5-2) have done much work using sets of erasure patterns, of which 
only a small amount has been published. In this technique a set of 
bits, equal to the number of redundant bits in the code, is erased. 
A candidate code word is then generated by reconstructing these bits 
from the unerased ones. Each erasure pattern generates another candidate 
to be correlated with the received vector. 

The reasoning behind using erasure patterns rather than 
error patterns is that the redundancy of binary codes is much greater 
than their error correcting power. For instance, the (128,64) BCH code 
used as an example in this report has 64 redundant bits but can correct 
only 10 errors. On the other hand, to correctly decode a received 
word, this scheme must cover all hard decision errors. Using error 
masks, a number of errors, up to the correcting power of the code, 
may remain exposed. The efficiency of both schemes is dependent upon 
the set of masks used and much thought has gone into finding methods 
of generating efficient sets. This will be discussed further in later 
sections describing each decoding scheme in detail. 

If only the single most probable erasure pattern is used, 
the probability of an error remaining in the unerased bits is considerably 
greater than the probability of decoding wrongly using a maximum likelihood 
decoder. B. Dorsch (Reference 1-5) noted that the unerased bits have 
a high probability of being correct and maximum likelihood performance 
could be approached by generating candidate code words by assuming 
error patterns of low weight in these bits. 

A decoding algorithm that tries all error patterns of weight 
w or less in the k unerased bits must generate and check 

$$\sum_{i=0}^{w} \binom{k}{i}$$

candidate code words. The large number of patterns required is somewhat 
offset by the ease in generating them. However, a slight modification 
results in a considerably more efficient algorithm. 
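
As a rough illustration of how quickly this count grows, the following 
sketch evaluates the number of error patterns of weight w or less in k 
unerased bits. The function name and the choice of w = 3 are assumptions 
made for the example, not parameters taken from the report. 

```python
from math import comb

def candidate_count(k, w):
    """Number of error patterns of weight 0..w in k unerased bits."""
    return sum(comb(k, i) for i in range(w + 1))

# k = 64 unerased bits of the (128,64) code, patterns up to weight 3.
print(candidate_count(64, 3))  # 1 + 64 + 2016 + 41664 = 43745
```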

If one does not erase all of the redundant bits, then the 
ensuing redundancy can be used to eliminate many error patterns in 
the remaining bits. Combining erasures and redundancy was implicit 
in Forney's error and erasure algorithm (Reference 1-6) which can be 
thought of as shortening the code a bit at a time by successively erasing 
the most unreliable bits and algebraically decoding the shortened code. 
This scheme does not work well, however, at low signal-to-noise ratios. 
One that does was developed by E. Berlekamp (Reference 1-7) for correcting 
a combination of burst errors and a small number of random errors and has 
been adapted here for random errors in linear codes. With a redundancy 
of r bits, on the average only 1 out of $2^r$ error patterns satisfies the 
r parity check equations. Increasing the number of unerased bits increases 
the total number of error patterns and, since the added bits are of lower 
reliability, the probability of errors of higher weight also increases. 
For small values of r, the first factor predominates and the computational 
efficiency of the algorithm increases significantly. 

If the maximum weight of the error in the unerased bits that 
need be considered is 2, then the generation of error patterns consistent 
with the parity check equations is considerably simplified. For the 
(128,64) BCH code many of the more probable error patterns of weight 
3 must be tried in order to approach maximum likelihood performance, 
especially when redundant bits are left unerased. On the other hand, 
Baumert's and McEliece's algorithm, which uses many erasure masks, can 
approach maximum likelihood performance without considering errors in 
the unerased bits at all. This suggests that these algorithms can be 
combined into a hybrid scheme which uses a small number of erasure masks, 
a few bits of redundancy, and consideration of error patterns of weight 
2 or less to achieve comparable performance. With a suitable selection 
of parameters such an algorithm is computationally more efficient than 
any one of its ancestors and is a possible competitor, in terms of 
complexity versus performance, to Viterbi decoding of convolutional codes. 

An alternate approach to selecting a set of candidate code 
words is to disregard some of the channel information and make an optimum 
decision on what remains. This method works well at low signal-to-noise 
ratios where many bits have a high probability of error and can be 
disregarded with only a small penalty in performance. Such an algorithm 
can be put into an iterative form. Starting with an initial estimate of 
a set of independent bits, the estimates can be improved by considering 
bits related to the initial set through the parity check equations of 
the code. These redundant bits are considered one at a time, in order 
of increasing probability of error. Each bit improves the estimate of 
the previous bits, and the ones with a higher probability of error have 
a smaller effect than those with a lower probability of error. As poorer 
and poorer bits are examined, they perturb the previous estimate less 
and less and a point is reached where the algorithm may be stopped with 
a high probability of being close to the optimum estimate. 

The code that was chosen to compare the various algorithms 
was the (128,64) BCH code of rate 1/2 and minimum distance 22. It was 
chosen because it is one of the shortest codes whose maximum likelihood 
performance is at least 1 dB better than that of the rate 1/2 constraint 
length 7 convolutional code. Figure 1-1, which compares the performance 
of a number of rate 1/2 codes, shows this clearly. These curves were 
obtained by Baumert and McEliece (Reference 1-4), who estimated the 
maximum likelihood performance by increasing the number of candidate 
code words generated by their algorithm until most of the wrong decodings 
were identifiably maximum likelihood errors. That is, the code word 
chosen as the best estimate had a higher correlation with the received 
vector than the known transmitted code word. 
Figure 1-1. Estimated Maximum Likelihood Performance of Several 
Rate 1/2 Block Codes. 


The results on the estimated performance of various decoding 
schemes were obtained using computer-generated bit error patterns. A random 
error vector was added to the vector (1, 1, ..., 1, 1), corresponding to the 
all zero code word, and the components of the resulting vector sorted 
according to decreasing absolute magnitude. The location of the symbols 
with negative values indicated the pattern of errors that would be made 
by a bit by bit hard decision decoder. If the decoding algorithm could 
erase or correct all these errors, then a correct decoding was assumed. 
This gives a lower bound on the error probability of the decoder since, in 
order to decode correctly, the error pattern must be erased or corrected. 
This does not guarantee, however, that a maximum likelihood error will not 
occur. This bound was found to be quite close to the actual performance 
of the decoding algorithm when operating more than 0.5 dB away from the 
maximum likelihood curves. 
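
The simulation procedure described above can be sketched as follows. This 
is a minimal illustration, not the program used for the report: the 
function name, the noise standard deviation `sigma`, and the use of 
Python's `random` module are all assumptions. 

```python
import random

def simulate_received_word(n, sigma, rng):
    """Simulate one received word for the all zero code word sent as +1s.

    Returns the received values sorted by decreasing absolute magnitude
    and the sorted positions a hard decision decoder would get wrong.
    """
    received = [1.0 + rng.gauss(0.0, sigma) for _ in range(n)]
    # Most reliable symbols (largest magnitude) come first.
    ordered = sorted(received, key=abs, reverse=True)
    # A negative value marks a hard decision error.
    errors = [pos for pos, value in enumerate(ordered) if value < 0]
    return ordered, errors

rng = random.Random(1)
ordered, errors = simulate_received_word(128, 0.8, rng)
print(len(errors), errors[:5])
```

A decoder model then only has to decide whether its erasures and error 
patterns cover every position in `errors`. 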

Since the received vectors in most cases were not actually 
decoded, the probability of information symbol error also had to be 
estimated. Given the probability of code word error, a lower bound on the 
bit probability of error could be found by assuming all errors result in 
estimating code words located the minimum distance from the transmitted 
code word. For the (128,64) BCH code of minimum distance 22, 

$$\frac{\Pr(\text{symbol error})}{\Pr(\text{word error})} \ge \frac{22}{128} = \frac{1}{5.8}$$

This ratio is asymptotically correct at high signal-to-noise ratios. 
At low signal-to-noise ratios in the range considered here, a ratio 
of 1/5 yields a good estimate of the symbol probability of error. 
SECTION 2 


AN ITERATIVE ALGORITHM APPROXIMATING MINIMUM PROBABILITY 
OF SYMBOL ERROR DECODING 


2.1 INTRODUCTION 

Decoding algorithms that minimize the probability of symbol 
error are relatively rare because of the more complex form of such 
a decoder. One of the first attempts to reduce the amount of computation 
was that of L. Bahl, J. Cocke, F. Jelinek, and J. Raviv (Reference 2-1), 
who used a trellis to represent the possible states of the decoder. 
The amount of computation required, however, was n (n = block length 
or decoding constraint length) times that of a Viterbi decoder for the 
same code. The next step was taken by Hartmann and Rudolph (Reference 2-2), 
who showed how to transform the decoding equations into the space of 
the dual code to a form that required much less storing of intermediate 
results. Their algorithm still required n times the computation of 
a minimum probability of code word error algorithm. 

The algorithm of Rudolph and Hartmann consists of generating 
a test statistic for each bit which is compared to a threshold. The 
statistic takes the form of the sum of a large number of terms, one 
for each code word in the dual code. By sorting the bits of the received 
vector according to probability of error, the order of the terms can 
be permuted so that the terms contributing significantly to the sum 
are considered first. The optimum estimate is then approached after 
only a small fraction of the terms have been summed (Reference 2-3). 

The ordering of the terms is a natural consequence of sorting 
the received bits according to increasing probability of error and 
decoding in the space of the dual code. It can be thought of as converting 
the algorithm to an iterative form. The k most reliable bits are independent 
and their value is initially estimated by hard limiting the received 
vector. The estimate can be improved by successively looking at bits 
related to them through the parity check equations of the code. These 
redundant bits are examined in order of increasing probability of error. 
Each bit, as it is examined, improves the estimate of the previous 
ones. However, the redundant bits with a higher probability of error 
have a smaller effect than those with a lower probability of error. 
As poorer and poorer bits are examined, they perturb the previous estimate 
less and less and a point is reached where the algorithm may be stopped 
with a high probability of being close to the optimum estimate. 

The algorithm, in its present form, is not as efficient as 
the ones described in subsequent sections for a number of reasons. The 
computation primarily consists of the products and sums of real numbers 
rather than of binary vector additions and, in common with the other minimum 
probability of symbol error algorithms, it requires a separate calculation 
for each symbol. 

2.2 A DESCRIPTION OF THE ALGORITHM 


As each symbol is received, the likelihood ratio, 

$$\phi_m = \frac{\Pr(r_m \mid c_m = 0)}{\Pr(r_m \mid c_m = 1)}$$

is calculated and mapped into the region (-1, +1) by the transformation 

$$\rho_m = \frac{1 - \phi_m}{1 + \phi_m}$$

When the entire code word has been received, the symbols are sorted 
according to increasing probability of error (or, equivalently, decreasing 
magnitude of $\rho_m$) so that the least reliable bits are to the right. 
The columns of the parity check matrix of the code are then permuted to 
the same order as the symbols. By using row operations only, the permuted 
parity check matrix can be reduced to a form which has a triangle of zeros 
in the upper right hand corner. The first p rows of this matrix represent 
the dependence between the k + p symbols with the least probability 
of error. If the remaining n - (k + p) symbols are considered erased, 
then an "optimum" decision, in the sense of minimum probability of 
symbol error, can be made using only this portion of the matrix. This 
form of the matrix also leads to an iterative algorithm. Starting 
with p = 0 and increasing p by one each iteration, successively poorer 
received bits are considered in estimating the transmitted symbols, 
until for p = n - k the "true" optimum estimate is reached. 
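
The preprocessing steps just described (transform, sort, permute, reduce) 
can be sketched in a few lines. The helper names below are hypothetical, 
and the small (7,4) Hamming parity check matrix is a stand-in chosen only 
to keep the example short; it is not a code from the report. The reduction 
assumes the n - k rightmost columns are linearly independent, which a 
practical implementation would have to enforce by swapping columns. 

```python
def transformed_ratio(phi):
    """Map a likelihood ratio phi into (-1, +1)."""
    return (1.0 - phi) / (1.0 + phi)

def reliability_order(rho):
    """Indices of the symbols, most reliable (largest |rho|) first."""
    return sorted(range(len(rho)), key=lambda m: -abs(rho[m]))

def permute_columns(H, order):
    """Reorder the columns of H to match the sorted symbols."""
    return [[row[j] for j in order] for row in H]

def reduce_upper_right(H):
    """Row-reduce H over GF(2) so row i is zero to the right of column k+i."""
    H = [row[:] for row in H]
    rows, n = len(H), len(H[0])
    k = n - rows
    for r in range(rows - 1, -1, -1):
        col = k + r
        # Find a row not yet placed that has a one in the target column.
        pivot = next((i for i in range(r + 1) if H[i][col]), None)
        if pivot is None:
            raise ValueError("rightmost columns are not independent")
        H[pivot], H[r] = H[r], H[pivot]
        # Clear the column in all rows above the pivot row.
        for i in range(r):
            if H[i][col]:
                H[i] = [a ^ b for a, b in zip(H[i], H[r])]
    return H

# Stand-in example: (7,4) Hamming code, columns assumed already sorted.
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
for row in reduce_upper_right(H):
    print("".join(map(str, row)))
```

Only row operations are used, so the reduced matrix checks the same code 
as the permuted one. 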

The decoding rule, for minimum probability of symbol error, 
using the method of decoding in the dual space of the code, is from 
equation (13) in Hartmann and Rudolph (Reference 2-2): 

$$A_m = \sum_{j=1}^{2^{n-k}} \prod_{l=1}^{n} \rho_l^{\,c'_{jl} \oplus \delta_{ml}} \;\gtrless\; 0$$

where 

$$\delta_{ml} = \begin{cases} 1 & \text{if } m = l \\ 0 & \text{if } m \ne l \end{cases}$$

$c'_{jl}$ is the lth bit of the jth code word in the dual code, whose code 
words are formed by a linear combination of the rows of the parity check 
matrix. When estimating the mth bit, the term $\rho_l$ is included in the 
product if the lth bit of $C'_j$ = 1 and m ≠ l, or the lth bit of 
$C'_j$ = 0 and m = l. 




As an initial estimate of the transmitted symbols (p = 0), 
let $\hat{c}_m = 1$ if $\rho_m > 0$ and $\hat{c}_m = 0$ if $\rho_m \le 0$. 
The first iteration uses 
only a single parity check equation so that there are only two words 
in the dual code: the all zero code word and the one equal to the first 
row of the parity check matrix. The all zero vector contributes to 
$A_m$ the term $\rho_m$, which is the initial estimate, and the single parity 
check equation contributes a single product term of the $\rho_l$'s. 

The algorithm is then iterated, each time adding another 
parity check equation and taking into consideration the "best" of the 
remaining received bits. At each iteration the number of terms in 
the sum is doubled. Because of the reduced form of the parity check 
matrix, with zeros in the upper right hand corner, the terms of $A_m$ from 
previous iterations are not changed when a new row of the matrix is 
added and the new terms can be added directly to the previous stage's 
estimate. At each iteration the estimate of each bit is improved and 
the bit probability of error decreases. 
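
A minimal sketch of this iteration is given below, assuming the parity 
check matrix has already been permuted and reduced and that the 
transformed likelihood ratios are in sorted order. The function names are 
hypothetical, the tiny single parity check example is not from the report, 
and the storage of all dual code words makes the sketch practical only for 
short codes. 

```python
def product_term(rho, word, m):
    """Product of rho[l] over positions where word[l] XOR (l == m) is one."""
    term = 1.0
    for l, (r, bit) in enumerate(zip(rho, word)):
        if bit ^ (1 if l == m else 0):
            term *= r
    return term

def iterative_estimates(rho, H):
    """Yield the statistics A_m after 0, 1, ..., len(H) parity checks."""
    n = len(rho)
    words = [[0] * n]      # dual code words used so far (all zero word)
    A = list(rho)          # iteration 0: the all zero word contributes rho_m
    yield list(A)
    for row in H:
        # New dual code words: the new row added modulo 2 to each old word.
        new_words = [[a ^ b for a, b in zip(w, row)] for w in words]
        for w in new_words:
            for m in range(n):
                A[m] += product_term(rho, w, m)
        words += new_words  # the number of terms doubles each iteration
        yield list(A)

# Toy example: three symbols tied together by one parity check.
rho = [0.5, 0.4, 0.2]
for stage in iterative_estimates(rho, [[1, 1, 1]]):
    print(stage)
```

Earlier terms are never recomputed; each pass only adds the products 
contributed by the newly formed dual code words. 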


2.3 AN EXAMPLE USING THE (23,12) GOLAY CODE 

A detailed example of this algorithm will be given for 
the (23,12) Golay code. The block length of this code is long enough 
to see the convergence of the algorithm to the optimum solution as 
the number of iterations increases and yet short enough that a complete 
decoding can be done in a reasonable time. In order to estimate the 
performance of the algorithm at a given signal-to-noise ratio, a large 
number of received vectors were generated and fully decoded. Only 
the 12 most reliable bits are estimated by the algorithm. The remainder 
are calculated through the parity check matrix. This forces the estimate 
of the transmitted vector to be a code word. The estimate of these 
bits was stored after each iteration and the code word was considered 
correctly decoded when the 12 bits indicated as most reliable were 
estimated correctly. 

Since the code is linear, the code space looks the same when 
viewed from any code word. Therefore, it can be assumed, without loss of 
generality, that the all zero code word is transmitted. The received word 
can be represented by a vector of dimension 23, 
$y = (y_1, y_2, \ldots, y_{23})$. Each element of the vector is of the 
form $y_i = 1 + n_i$ where $n_i$ is a sample of a zero mean Gaussian 
process of variance $\sigma^2$. 
As a numerical example, consider a received code word which 
contains four hard errors and therefore cannot be decoded correctly 
using algebraic decoding. In addition, one of the errors is among 
the 12 best bits so that the initial estimate of this algorithm would 
also be in error. A received vector (SNR = 1.0 dB) and the corresponding 
likelihood ratios are tabulated in Table 2-1. 
Table 2-1. Amplitude, Likelihood Ratio, and Transformed 
Likelihood Ratio for Bit Locations 

m            1     2     3     4     5     6     7     8     9    10    11    12
$a_m$      .99  1.25   .44  -.63   .52   .25   .01   .97   .36  -.90  1.13   .41
$\phi_m$  .074  .041   .26  3.04   .22   .41   .70  .808   .32  5.62  .054   .28
$\rho_m$   .86   .92   .59  -.50   .64   .42   .17   .86   .52  -.70   .90   .56

m           13    14    15    16    17    18    19    20    21    22    23
$a_m$      .11   .06  1.12   .33  -.74  -.24   .94  1.86   .71  2.87  1.73
$\phi_m$   .36   .62  .054   .33  3.90  1.24     *  .010   .14  .001  .013
$\rho_m$   .28   .24   .90   .50  -.59  -.11   .85   .98   .75   .99   .97

m = Bit number. 
$a_m$ = Amplitude of mth bit. 
$\phi_m$ = Likelihood ratio of mth bit. 
$\rho_m$ = Transformed likelihood ratio of mth bit. 
* Value illegible in the source. 


The first step of the algorithm is to sort the bits according 
to the absolute values of the transformed likelihood ratios $\rho_m$. The 
sorted order is shown in Table 2-2. 


Table 2-2. New Bit Labeling Ordered by Transformed Likelihood Ratio 

Sorted Bit Order       1    2    3    4    5    6    7    8    9   10   11   12
Original Bit Order    22   20   23    2   11   15    8    1   19   21   10    5
Sorted $\rho_m$      .99  .98  .97  .92  .90  .90  .86  .86  .85  .75 -.70  .64

Sorted Bit Order      13   14   15   16   17   18   19   20   21   22   23
Original Bit Order    17    3   12    9   16    4    6   13   14    7   18
Sorted $\rho_m$     -.59  .59  .56  .52  .50 -.50  .42  .28  .24  .17 -.11
The original parity check matrix for this code is 


10100100111110000000000 
01010010011111000000000 
00101001001111100000000 
00010100100111110000000 
00001010010011111000000 
00000101001001111100000 
00000010100100111110000 
00000001010010011111000 
00000000101001001111100 
00000000010100100111110 
00000000001010010011111 


The columns are permuted to the same order as the received 
symbols to yield 


00001010001001110011000 
00011000001000101001110 
00001101000101100001100 
00000100000000111111100 
00000100001110000101110 
00011010000100001101010 
00000100100010110100011 
01000001101010000101001 
01001000110010010000101 
11000100111000100000001 
11101000110000000101000 


Reducing this matrix by row operations to form a triangle of 
zeros in the upper right hand corner yields 


00111010110110000000000 
11110000011101000000000 
01101101100010100000000 
11001001010001110000000 
10000101100100111000000 
11001101001100000100000 
01000101110000010110000 
10000101010010100101000 
10001100001010110000100 
11000000011010010100010 
11000100111000100000001 
The estimator 

$$A_m^{(i)} = \sum_{j=1}^{2^i} \prod_{l=1}^{n} \rho_l^{\,c'_{jl} \oplus \delta_{ml}}$$

where i = iteration number, will be used with this matrix and the sorted 
$\rho_m$'s. For the ith iteration the code words $C'_j$ are formed by all 
linear combinations of the first i rows of the matrix. 

The initial estimate can be thought of as using the estimator 
with a dual code consisting of the all zero vector alone. The index 
j, which indicates the jth code word in the dual code, has only the 
value 1, and $c'_{jl}$ is zero for all values of l. In this case, 

$$A_m^{(0)} = \prod_{l=1}^{n} \rho_l^{\delta_{ml}}$$

Since $\delta_{ml} = 0$ for m ≠ l and 1 for m = l, $A_m^{(0)} = \rho_m$. 
When the only 
code word in the dual code is the all zero one, the code itself contains 
all possible binary n-tuples. All the bits are independent and the 
best estimate of any bit depends only upon the likelihood ratio of 
that bit itself. 

The first iteration begins to use the dependence between 
the bits as expressed by the first parity check equation (the first 
row) of the H matrix 

$$h_1 = (00111010110110000000000)$$

There are only two code words in the dual code, the all zero code word 
and $h_1$, so that 

$$A_m^{(1)} = \begin{cases} \rho_m + \dfrac{\rho_3 \rho_4 \rho_5 \rho_7 \rho_9 \rho_{10} \rho_{12} \rho_{13}}{\rho_m} & \text{for } m = 3, 4, 5, 7, 9, 10, 12, 13 \\[2ex] \rho_m + \rho_m \, \rho_3 \rho_4 \rho_5 \rho_7 \rho_9 \rho_{10} \rho_{12} \rho_{13} & \text{otherwise} \end{cases}$$



Note that each term corresponds to a code word in the dual code. Each 
term contains the product of the transformed likelihood ratios of all the 
bits in the code word which are equal to one multiplied or divided by the 
ratio of the mth bit. 

The second iteration uses two parity check equations $h_1$ and $h_2$. 

$$h_2 = (11110000011101000000000)$$

The four code words in the dual code are 

C'_1 = 0 h_2 + 0 h_1 = 00000 00000 00000 00000 000 
C'_2 = 0 h_2 + 1 h_1 = 00111 01011 01100 00000 000 
C'_3 = 1 h_2 + 0 h_1 = 11110 00001 11010 00000 000 
C'_4 = 1 h_2 + 1 h_1 = 11001 01010 10110 00000 000 
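
The four dual code words can be checked mechanically by forming every 
modulo 2 combination of the two rows. The helper below is a hypothetical 
illustration; only the two row vectors come from the reduced matrix above. 

```python
from itertools import product

def dual_code_words(rows):
    """All modulo 2 linear combinations of the given rows, as bit strings."""
    n = len(rows[0])
    words = []
    for coeffs in product((0, 1), repeat=len(rows)):
        word = [0] * n
        for c, row in zip(coeffs, rows):
            if c:
                word = [a ^ b for a, b in zip(word, row)]
        words.append("".join(map(str, word)))
    return words

h1 = [int(b) for b in "00111010110110000000000"]
h2 = [int(b) for b in "11110000011101000000000"]

# Coefficients run over (h2, h1) = (0,0), (0,1), (1,0), (1,1).
words = dual_code_words([h2, h1])
for w in words:
    print(w)
```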


At this step the purpose of permuting the columns of the 
H matrix and reducing it to one with all zeros in a triangle in the 
upper right hand corner becomes clear. First, the code words in the 
dual code at the ith iteration contain all the code words of the 
(i-1)st iteration so that 

$$A_m^{(i)} = A_m^{(i-1)} + \text{new products}$$

These new products are formed from the code words generated 
by adding modulo 2 the new parity check equation $h_i$ to all the code 
words of the previous iteration. Second, all the new products include 
the transformed likelihood ratio of the (k + i)th bit but not of bits 
less reliable than this. Thus the ith iteration uses the previous 
estimate and the parity check equation containing the best bit not 
yet used to obtain an improved estimate. 

At each iteration the estimate as to which bits are best 
may change. This is seen in Figure 2-1 where the $A_m^{(i)}$'s are plotted 
as a function of iteration number. As more parity check equations 
are used, the absolute value of $A_m$ (which is a measure of goodness) 
of the bits whose initial estimate was wrong decreases relatively 
rapidly. At the seventh iteration the best 13 bits are correct and 
the code word would be decoded correctly if the algorithm were 
stopped at this point. At the final iteration, using the full decoding 
algorithm (equivalent to maximum likelihood decoding), the best 17 
bits are correct. Of the four bits whose initial estimate was wrong, 
only the one with the poorest initial estimate was corrected. In cases 
where there are fewer errors or the likelihood ratios of the correct 
bits are initially higher, the wrong bits are also corrected, but this 
is not necessary for correct decoding using this algorithm. 

The performance of this code as a function of number of 
iterations is shown in Figure 2-2. The zero iteration curve illustrates 
the performance possible by assuming there are no errors in the best 12 
bits, while the 11th iteration curve represents the performance using 
the full decoding algorithm. Note that in the sixth iteration, using 
$2^6 = 64$ terms in the sum of the estimator, the performance is almost 
equivalent to the full decoding algorithm which requires $2^{11} = 2048$ 
terms in the sum. 
Figure 2-1. Improvement of Bit Estimates as a Function of 
Iteration Number 



SECTION 3 


ERROR PATTERNS AND ALGEBRAIC DECODING 


Algebraic decoding can be used to correct errors within a 
distance $d_m/2$ (where $d_m$ = the minimum distance of the code) of any 
error pattern by adding it modulo 2 to the hard limited received vector 
and then decoding. Proper selection of a set of such patterns, which 
will be called masks, results in searching a region of space where the 
errors are most likely for a set of candidate code words. These code 
words are then correlated with the received vector and the one with the 
highest correlation is selected as the best estimate of the transmitted 
code word. 


D. Chase (Reference 1-3) suggested using the $2^{12}$ words of 
the (23,12) Golay code as masks on the 23 least reliable bits of the 
(128,64) BCH code. Since any vector of dimension 23 is within a distance 
3 of at least one of the Golay code words, any error pattern in the 23 
bits will be reduced to 3 or fewer errors when added to some mask. The 
larger code can correct up to 10 errors overall. Therefore, depending 
upon how many errors in the least reliable 23 bits are left uncorrected, 
patterns of 7 to 10 errors in the most reliable 105 bits can be corrected. 

The performance of this decoding algorithm is within 1/2 dB 
of maximum likelihood (Figure 3-1) and requires 4096 algebraic decodings 
and correlations. This is considerably more computation than required 
to obtain comparable performance using the algorithms described in 
later chapters and is due mainly to three factors. First, this algorithm 
is based upon detecting errors in contrast to erasing them. For the 
(128,64) BCH code, the redundancy is 64 bits as compared to an error 
correcting power of 10 bits and erasing errors is more effective than 
correcting them. Second, the division into two groups of 23 and 105 was 
not based upon any property of the large code but only upon the perfection 
of the Golay code. It may possibly be more efficient to divide the 
(128,64) BCH code into two more evenly balanced groups. Fewer errors 
would exist, on the average, in the more reliable group so that, while 
the less reliable group is larger, one need not correct so many 
errors in it. Third, the symbols are divided arbitrarily into two 
groups, one of relatively high and one of relatively low probability 
of error. Within these groups all bits are considered equal. However, 
it is more efficient to take into consideration, even approximately, 
that the bits have a continuous distribution of probability of error. 


Figure 3-1. Relative Performance of the Chase Decoding Algorithm 
for the (128,64) BCH Code 
SECTION 4 


DECODING WITH ERASURES AND ERROR PATTERNS 


4.1 INTRODUCTION 

For most codes the redundancy is greater than the minimum 
distance between code words so that it is possible to correct many 
more erasures than errors. The simplest erasure correcting scheme 
is one that erases the n-k bits whose estimated probability of error 
is highest and reconstructs these bits from the k remaining ones by 
means of the parity check matrix. This scheme will yield the correct 
estimate only when all the k non-erased bits are estimated correctly by 
a hard decision decoder. At low signal-to-noise ratios the performance 
of such a scheme is quite a bit poorer than maximum likelihood but 
can be improved by expanding the scheme to produce a set of possible 
code words. 
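
The reconstruction step of this simplest scheme can be sketched as 
follows, assuming the parity check matrix has been permuted to the sorted 
bit order and reduced so that row i has a one in column k + i and zeros to 
its right. The function name is hypothetical, and the reduced (7,4) 
Hamming matrix is a stand-in example rather than a code from the report. 

```python
def reconstruct_erased(kept_bits, H_reduced):
    """Rebuild the n-k erased bits from the k most reliable hard decisions.

    Row i of H_reduced involves only columns 0..k+i, so each erased bit
    follows from the bits already known (forward substitution).
    """
    k = len(kept_bits)
    word = list(kept_bits)
    for i, row in enumerate(H_reduced):
        parity = 0
        for j in range(k + i):
            parity ^= row[j] & word[j]
        word.append(parity)  # bit k+i makes parity check i come out even
    return word

# Reduced parity check matrix of the (7,4) Hamming code (stand-in example).
H_reduced = [[0, 1, 1, 1, 1, 0, 0],
             [1, 0, 1, 1, 0, 1, 0],
             [1, 0, 1, 0, 1, 0, 1]]
print(reconstruct_erased([1, 0, 0, 0], H_reduced))
```

The result is always a code word, but it is the transmitted one only if 
all k kept hard decisions are correct. 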


In the k remaining bits which are not erased, there are $2^k$ 
possible error patterns. If the decoder correlates with the received 
vector all of the $2^k$ possible code words generated by adding these error 
patterns to the unerased bits, maximum likelihood performance will be 
achieved. (Note that there is not a one-to-one correspondence between 
error patterns and code words. If the k bits are not all independent, 
there will be no solution when the dependent bits are not consistent 
with the values of the independent ones and more when they are.) Between 
the extremes of trying no error patterns and trying them all, one can 
choose a reasonably sized subset. 

To achieve a level of performance of less than 0.5 dB from 
maximum likelihood approximately 10,000 candidate code words are required. 
In spite of the large number, this decoding scheme is competitive with 
the others because of the small number of operations needed to generate 
each candidate. In this section it will be shown how one should select 
an efficient set of error patterns and arrange the calculation such 
that the number of operations will be the minimum possible. 


4.2 GENERATING ERASURE PATTERNS ORDERED ACCORDING TO PROBABILITY 

Assuming that only a fixed number of error patterns can be 
tried, it is worthwhile to use the set which contains the ones which 
are most probable. If the channel is memoryless, then the probability 
of a given error pattern e in a block of m bits is 

$$ \Pr(e) = \prod_{i=1}^{m} q_i \; \prod_{i=1}^{m} \left( \frac{p_i}{q_i} \right)^{e_i} $$

where 

    $e_i$ = 1 if the ith bit is in the error pattern; 
          = 0 otherwise 
    $p_i$ = probability of ith bit being in error 
    $q_i$ = probability of ith bit being correct. 

The first product is independent of the error pattern chosen. 
Representing it by the constant $C_1$, 

$$ \Pr(e) = C_1 \prod_{i=1}^{m} \left( \frac{p_i}{q_i} \right)^{e_i} $$

and taking logarithms, 

$$ \ln \Pr(e) = \ln C_1 + \sum_{i=1}^{m} e_i \ln \frac{p_i}{q_i} $$

For a Gaussian channel the log likelihood ratio is proportional to 
the bit amplitude $a_i$, so that 

$$ \ln \Pr(e) = \ln C_1 - C_2 \sum_{i=1}^{m} e_i a_i $$

The logarithm is a monotonic function of its argument so that ordering 
the error patterns by decreasing probability of occurrence is equivalent 
to ordering them by increasing d, where 

$$ d = \sum_{i=1}^{m} e_i a_i = e \cdot a $$
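
The ordering rule above can be illustrated with a short sketch. It is not taken from the report: the function name and the integer amplitudes are hypothetical, and the only assumption is that each bit's amplitude is a non-negative reliability measure, so that a smaller sum d marks a more probable error pattern.

```python
from itertools import combinations

def patterns_by_reliability(amplitudes, max_weight):
    """Return error patterns (tuples of bit positions) sorted by
    d = sum of the amplitudes of the bits assumed to be in error."""
    m = len(amplitudes)
    scored = []
    for weight in range(max_weight + 1):
        for positions in combinations(range(m), weight):
            d = sum(amplitudes[i] for i in positions)
            scored.append((d, positions))
    scored.sort()                      # increasing d = decreasing probability
    return [positions for _, positions in scored]

# Hypothetical amplitudes for a 4-bit block, least reliable bit first.
order = patterns_by_reliability([1, 4, 9, 16], max_weight=2)
```

In this toy block the double error in bits (0, 1) has d = 5 and is therefore tried before the single error in bit 2, which has d = 9.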


It is not necessary to calculate the most probable error 
patterns for each received vector; a set of fixed error patterns, 
independent of the received signal-to-noise ratio, can be used with 
negligible loss of performance. This simplification has been justified 
by simulation, but it can be demonstrated simply by observing the mean 
amplitude of the received bits after sorting. An example for a block 
length of 128 is shown in Figure 4-1. The curves are linear over most 
of their length, except for a few of the worst and best bits, and change 
slowly with signal-to-noise ratio. As the signal-to-noise ratio increases, 
the variation in bit amplitude decreases and approaches one for all 
bit positions. Thus, the algorithms which depend upon sorting the 
received bits according to their amplitudes become less efficient. 




Figure 4-1. Average Bit Amplitude of Received Bits After Ordering 
Block Length = 128 

However, at these signal-to-noise ratios, practically all received 
vectors are within half the minimum distance between code words so 
that maximum radius decoding algorithms can be used. For signal-to- 
noise ratios of 1.0 - 5.0 dB the algorithms discussed here are superior. 
In this region a good approximation to the mean bit amplitude is 

$$ a_i = \frac{128 - b_i}{64} $$

where $b_i$ = sorted bit position. The denominator is just a scale factor 
and may be dropped without affecting the ordering. 

Defining an error pattern index f as 

$$ f = 128n - \sum_{i=1}^{n} b_i $$

where n = number of errors in the error pattern, it is easy to generate 
error patterns according to ascending index. Figure 4-2 illustrates 
the number of error patterns as a function of the error pattern index. 
All single errors have a lower index than any double error, but there 
are triple errors with lower index than some double errors. There 
are 2080 single and double error patterns. Checking them all, in order 
of ascending index, reduces the probability of error, as a function 
of number of patterns, more slowly than using the set of patterns that 
allow triple errors. The overall performance for this error and erasure 
decoding algorithm, as a function of the number of error patterns, 
is shown in Figure 4-3. To achieve a given level of performance, e.g. 
within 1/2 dB of a maximum likelihood decoder, the number of error 
patterns (or candidate code words) required decreases slowly as the 
signal-to-noise ratio increases. 
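
The index ordering can be checked with a small sketch. It assumes, as an illustration only, that the 64 unerased bits occupy sorted positions 1 through 64 of the 128-bit block; the function names are hypothetical.

```python
from itertools import combinations

BLOCK = 128      # block length of the (128,64) code
UNERASED = 64    # errors are assumed only in these sorted positions (1..64)

def pattern_index(positions):
    """f = 128*n - sum(b_i) for an error pattern with n errors."""
    return BLOCK * len(positions) - sum(positions)

def patterns_in_index_order(max_weight):
    """All error patterns of weight 1..max_weight, by ascending index."""
    patterns = [p for weight in range(1, max_weight + 1)
                for p in combinations(range(1, UNERASED + 1), weight)]
    return sorted(patterns, key=pattern_index)

single_and_double = patterns_in_index_order(2)
```

Under this assumption the list holds 64 + 2016 = 2080 patterns, every single error precedes every double error, and extending `max_weight` to 3 interleaves some triple errors with the double errors.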


4.3 GENERATING CANDIDATE CODE WORDS FROM THE ERROR PATTERNS 

The generation of candidate code words from a particular error 
pattern is done in two stages. First the error pattern is checked for 
consistency with the parity check equations. Then, if it is consistent, 
a set of code words is generated. Consistency requires that 
$[P_{1a}](a_1 + e) = 0$. 

In addition to a compare and branch, the number of vector additions 
needed to check this is equal to the weight of e, which is one, two 
or three in the practical cases discussed here. If the set of error 
patterns is determined beforehand, then they can be arranged in such 
a way that only one vector addition is needed. In achieving this saving, 
however, the error patterns are no longer checked in order of ascending 
error pattern index. 

When the error pattern is consistent with the parity check 
equations, then candidate code words exist. The bits of the candidate 
code words can be divided into 3 sets: 

(1) $c_1$, containing the bits which were checked for consistency 

        $c_1 = a_1 + e$ 

(2) $c_{2a}$, a number of arbitrary bits 

(3) $c_{2b}$, containing the bits calculated from (1) and (2) 

        $c_{2b} = a_2 + [P_{1b}]e + [P_3]c_{2a}$ 

The first two sets are calculated once, requiring a maximum of three 
additions (when the weight of e = 3). This generates the first candidate 
code word, corresponding to $c_{2a} = 0$. The remaining bit patterns of 
$c_{2a}$ can then be generated in Gray code sequence so that the succeeding 
code words can be calculated by using only a single vector addition 
per code word. 
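
The Gray code enumeration can be sketched as follows. This is an illustration under assumptions, not the report's program: vectors are packed into Python integers, `base` stands for the first candidate, and `columns` stands for the matrix columns that multiply the arbitrary bits. Both names are hypothetical.

```python
def gray_code_candidates(base, columns):
    """Yield base + (sum of a subset of columns) over GF(2) for every
    subset, visiting subsets in Gray code order so that each new
    candidate costs a single vector addition (XOR)."""
    word = base
    yield word
    for step in range(1, 1 << len(columns)):
        # Between consecutive Gray codes exactly one bit flips: the
        # position of the lowest set bit of the step counter.
        flip = (step & -step).bit_length() - 1
        word ^= columns[flip]
        yield word

candidates = list(gray_code_candidates(0b1010, [0b0001, 0b0110, 0b1100]))
```

With three arbitrary bits the generator produces all 8 candidates using 7 XOR operations, rather than recomputing each subset sum from scratch.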




Figure 4-2. Cumulative Number of Error Patterns for a Given Index 

SECTION 5 


DECODING USING SETS OF ERASURE PATTERNS 


5.1 INTRODUCTION 

A more efficient method of generating candidate code words, 
in the sense of requiring fewer of them to obtain a given level of 
performance, is to use sets of erasure patterns. For each erasure 
pattern in the set, the corresponding bits in the received word are 
erased and then reconstructed by means of the parity check equations. 

A correct decoding requires that for at least one of the patterns, 
all of the unerased bits be error free. 

One approach to constructing sets of erasure patterns is 
to try to cover as many error patterns as possible with as few masks 
as possible. A combinatoric solution to this problem is by means of 
t-designs (Reference 5-1). Given a set of v elements, a t-design is 
a collection of subsets (blocks) of k elements with the property that 
any subset of t elements is contained in exactly λ blocks. The design 
is represented as t-(v, k, λ) and is sometimes called a tactical 
configuration. When used to construct erasure masks, v corresponds to 
the block size and k to the number of erased bits. All errors of weight 
t or less are covered λ times. The difficulty of this approach is that 
designs are known only for small t and are therefore useful only for 
codes of short block length. By relaxing the requirement that all error 
patterns be covered exactly λ times, L. Baumert, R. McEliece and G. 
Solomon have devised a method of generating masks from sets of code 
words (Reference 5-2). However, both these methods have the disadvantage 
that all patterns are treated equally. A more efficient set of masks 
would consider the probability of a given error pattern occurring rather 
than just its weight. This criterion leads to a statistical approach 
for the design of sets of masks. 

The first set of such masks was generated by L. Baumert and 
R. McEliece using a weighing based upon the entropy of the error 
probability of each bit (Reference 1-4). (The entropy could not be used 
directly because it violated the constraints on allowable weighing functions, 
which will be seen in the next section.) This approach will be used here 
to generate a good set of erasure masks. Starting with a linear weighing, 
sets of masks with slightly perturbed weighings are generated. The best 
ones are selected and the process is repeated until it converges to a 
good weighing function. This approach is possible since the performance 
varies only slowly with changes in weighing. The set of masks arrived 
at in this manner was almost identical to that of Baumert and McEliece, 
which confirmed the accuracy of their intuition. 

With 1000 masks, maximum likelihood performance is approached 
to within a few tenths of a dB. However, the generation of candidate code 
words from these masks and the received code word requires a relatively 
long calculation. In the latter part of this section it is shown that the 
masks can be ordered within a given set so that the number of calculations 
per candidate code word is considerably reduced. 


5.2 GENERATING AN EFFICIENT SET OF ERASURE MASKS 


The statistical method generates a set of masks which covers 
each bit, on the average, a given percent of the time but does not 
attempt to cover a specific set of errors. The advantage of this method 
is that it is relatively easy to generate a set with a given number of 
masks and a wide range of distributions and which, almost always, cover 
the errors in an efficient manner. The exact distribution is not criti- 
cal; however, the bits with a probability of error close to zero should 
hardly ever be covered and those with a probability of error close to 
one-half should almost always be covered. Also, since the fraction of 
bits left uncovered by each mask must equal the rate of the code, the 
entire distribution must also satisfy that constraint. These constraints 
may be written as: 

(1) = m ; la = 0 

(2) li 2 Ij if i > j 
(3) $\sum_{i=1}^{n} l_i = m \cdot n \cdot (1 - r)$ 

where 

$l_i$ = the number of times the ith bit is covered 
m = number of masks 
n = number of bits in code word 
r = rate of code 

There are an extremely large number of functions that satisfy 
these constraints. However, it is not difficult to find good ones 
by trial and error since their performance is relatively insensitive to 
the exact shape of the function. For example, consider a code of rate 
1/2. From the third constraint the area under the function must equal 
(m·n)/2, one half the area enclosed by the graph. One function that 
satisfies these constraints is linear in percent of bits covered versus 
bit number (Figure 5-1, curve a). Others, such as b and c, which 
emphasize the covering of bits with a high or low probability of error, and 
d, which has a large discontinuity, are also possible. By simulation it 
has been found that functions with the general shape of b generate masks 
with best performance. This will be discussed in greater detail in the 
following section. 

If the fractional area under the desired function equals the 
rate of the code, then one need only find the appropriate scale factor 
and compensate for the small errors introduced by rounding off the 
function to integral values. For example, the linear function 

li = m(l - i) 

satisfies 

$$ \sum_{i=1}^{n} l_i = \frac{m \cdot n}{2} $$

so that it is suitable for a rate 1/2 code. 

Most of the values of $l_i$ are non-integral, and truncating or 
rounding these numbers may produce a sum not equal to (m·n)/2. 
The slight adjustment required may be made by letting $l_i$ equal the 
integral part of an expression containing an adjustable constant c 
and varying c until the sum again equals (m·n)/2. 

For a general function, however, the fractional area under 
it does not equal the rate and some distortion must be introduced in 
order to meet this constraint. One possibility, used by L. Baumert for 
functions which enclose a fractional area less than the rate, is to 
multiply the function by a scale factor which will generate values of 
$l_i$ (for some i) greater than m and then to limit these values to m. The 
scale factor (and area) is increased until the third constraint is met. 
This procedure usually results in functions similar to curve b of 
Figure 5-1, which have been found to yield efficient masks. 
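
The scale-and-limit procedure can be sketched in a few lines. This is a hedged illustration, not Baumert's program: the bisection search, the largest-remainder rounding step, and the names `scale_and_clip`, `shape` and `target` are all assumptions introduced here.

```python
import math

def scale_and_clip(shape, m, target):
    """Scale a non-negative shape function, limit each value to m, and
    round to integers so that the values sum exactly to target."""
    def clipped(scale):
        return [min(float(m), scale * f) for f in shape]

    if target > m * sum(1 for f in shape if f > 0):
        raise ValueError("target cannot be reached with this shape")
    lo, hi = 0.0, 1.0
    while sum(clipped(hi)) < target:       # find an upper bracket
        hi *= 2.0
    for _ in range(100):                   # bisection on the scale factor
        mid = (lo + hi) / 2.0
        if sum(clipped(mid)) < target:
            lo = mid
        else:
            hi = mid
    values = clipped(hi)
    counts = [math.floor(v) for v in values]
    # Hand the rounding deficit to the largest fractional parts.
    deficit = target - sum(counts)
    by_fraction = sorted(range(len(values)),
                         key=lambda i: values[i] - counts[i], reverse=True)
    for i in by_fraction[:deficit]:
        counts[i] += 1
    return counts

# Hypothetical quadratic shape: 16 bits, 8 masks, rate 1/2 (sum = 8*16/2).
counts = scale_and_clip([i * i for i in range(16)], m=8, target=64)
```

The quadratic shape encloses less than half the area of the graph, so the scale factor grows until the largest values are limited to m, which yields a curve with the flat-topped character described above.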

Given the normalized distribution which satisfies this 
constraint, the cover is generated by randomly selecting a mask and 
placing a one in the ith bit position until $l_i$ ones have been placed. 
As i approaches n, certain moves will be forced in order to place the 
proper number of ones and zeros in each mask. These are done first, 
before the remaining ones are placed randomly. The algorithm will fail 
if the forced moves require more than $l_i$ ones in the ith position. 
However, this occurs very rarely in practice and can be remedied by some 
slight adjustment of the bits in previous positions. In the generation 
of many sets of masks, these slight deviations from a purely random 
placement of ones and zeros have not been found to produce any adverse 
effects. 
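
A minimal sketch of this random placement with forced moves is given below. It rests on several assumptions that are not in the report: masks are lists of 0/1 values, a failed attempt is simply restarted instead of adjusting earlier positions, and the example counts are listed from the most often covered bit to the never covered bit.

```python
import random

def generate_masks(counts, weight, rng, max_attempts=1000):
    """Build masks of len(counts) bits, each with `weight` ones, such
    that bit i is set in exactly counts[i] masks."""
    n = len(counts)
    if sum(counts) % weight:
        raise ValueError("sum(counts) must be a multiple of the mask weight")
    m = sum(counts) // weight
    for _ in range(max_attempts):
        masks = [[0] * n for _ in range(m)]
        need = [weight] * m                # ones still to be placed per mask
        ok = True
        for i in range(n):
            left = n - i                   # positions remaining, including i
            forced = [j for j in range(m) if need[j] == left]
            free = [j for j in range(m) if 0 < need[j] < left]
            extra = counts[i] - len(forced)
            if extra < 0 or extra > len(free):
                ok = False                 # forced moves do not fit: retry
                break
            for j in forced + rng.sample(free, extra):
                masks[j][i] = 1
                need[j] -= 1
        if ok:
            return masks
    raise RuntimeError("no valid set of masks found")

# Hypothetical linear distribution: 16 bits, 8 masks of weight 8.
example_counts = [8, 7, 7, 6, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0]
masks = generate_masks(example_counts, weight=8, rng=random.Random(1))
```

Restarting on failure keeps the sketch short; the report instead repairs the rare failure by adjusting bits in earlier positions.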


The linear distribution function is a good starting point 
for investigating masks suitable for codes of rate 1/2. Using the 
(128,64) BCH code as an example, a set of 1000 masks was generated and 
tested against 10,000 received vectors at various signal-to-noise ratios. 
The received vectors were sorted and hard limited, and the bit positions 
containing errors were noted. The masks were then successively placed 
over the error pattern and the number of masks tested before the error 
pattern was completely covered was recorded. No attempt was made to 
order the masks in any particular way. 
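
The covering test in this experiment reduces to a subset check. The sketch below assumes each mask is represented as a set of erased bit positions; the function name is hypothetical.

```python
def masks_until_covered(masks, error_positions):
    """Return how many masks are tried before one erases every bit in
    error_positions, or None if no mask covers the whole pattern."""
    errors = set(error_positions)
    for tried, mask in enumerate(masks, start=1):
        if errors <= mask:             # every error lies under the mask
            return tried
    return None

example_masks = [{0, 1, 2}, {2, 3, 4}, {1, 2, 3}]
first_cover = masks_until_covered(example_masks, [2, 3])
```

A result of `None` corresponds to the case in which the decoder using these masks cannot produce the transmitted code word.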
Figure 5-1. Weighing Functions for the (128,64) BCH Code 


The resulting curves, seen in Figure 5-2, are an approximation 
to a lower bound on the probability of error. If the error vector 
is not covered then the decoder, using these masks, will surely be in 
error. If the error is covered, the transmitted code word will be 
among the candidate code words, but there still may exist another code 
word that has a larger correlation with the received vector. This bound 
is generally a good approximation except when the decoding algorithm is 
operating very close to maximum likelihood performance. Using 1000 
masks, the performance of the decoding algorithm is approximately 0.15 
dB worse than the maximum likelihood. 

The linear distribution was selected as one satisfying a 
number of simple heuristic considerations. That these considerations 
are valid is demonstrated by the fact that performance equal to that 
obtained with erasure and error patterns requires only one-tenth the 
number of candidate code words. It may be, however, that other distribu- 
tions can generate a set of masks that perform even better. To check 
this, two distributions symmetric about the linear one, b and c of 
Figure 5-1, were tried in order to get an idea in which direction to 
proceed. Their performance, shown in Figure 5-3, and that of others not 
shown, indicated that better performance can be obtained with weighings 
which cover the bits with high probability of error more often than the 
linear weighing. 

A family of distributions with this property was then 
investigated. Rather than using a set of symmetric functions, a set which 
always covered a given number of bits with the highest probability of 
error was selected (see Figure 5-4). The performance is only slightly 
different from that of a similar symmetric distribution and includes, 
as distribution g, the one originally used by L. Baumert. Using 1000 
masks and 10,000 received vectors at an SNR = 2.0 dB, the probability of 
an error pattern not being covered is 

Distribution                a      e      f      g      h      i 

Probability error 
pattern not covered       .020   .015   .011   .011   .021   .030 

Figure 5-2. Performance of a Set of Masks with Linear Weighing 

Distribution g is among the best and was investigated further. 
Its performance at various signal-to-noise ratios is given in Figure 5-5. 
Note that the performance curves are very similar to those of the linear 
distribution with a small displacement. This is another indication that 
the performance changes slowly with changes in distribution. The 
performance is only slightly changed when different sets of masks, 
generated from the same distribution, are used. Figure 5-6 illustrates 
this point using two sets generated from distribution g. 


5.3 ORDERING THE MASKS FOR EFFICIENT DECODING 

When a code word is received, the bits are sorted in order 
of their absolute magnitude and the columns of the parity check matrix 
are permuted accordingly. Each mask erases n - k bits and these must be 
determined from the unerased ones in order to generate a candidate code 
word. This operation requires solving a set of simultaneous equations, 
or in terms of matrix operations, the reduction of the permuted parity 
check matrix to standard form. Since each mask erases a different set of 
bits, the reduction must be done anew for each candidate code word. 
This calculation is by far the most time consuming and its efficiency 
determines the overall efficiency of the entire algorithm. It is 
therefore worthwhile to reduce the computation of the reduction of the 
parity check matrix to a minimum. 

Figure 5-3. Relative Performance of Sets of Masks Using the 
Weighing Functions of Figure 5-1 (SNR = 2.0 dB) 

Figure 5-4. Weighing Functions for Rate 1/2 Codes 
(vertical axis: percent of time bit is covered) 

Figure 5-5. Performance Using the Weighing Function of 
L. Baumert (g) 


As before, the rate 1/2, (128,64) BCH code will be used as 
an example. Given the parity check matrix H, for any code word c, 
[H]c = 0. If H is in reduced echelon form then it can be partitioned 
into two parts, one of them an identity matrix. Partitioning c 
corresponding to the partition in H 

$$ [\, P \mid I \,] \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = 0 $$

or 

$$ c_2 = [P]\, c_1 $$

In this form the parity check bits $c_2$ can be determined directly from 
the information bits $c_1$. 

Each mask erases 64 of the 128 bits which must be 
reconstructed from the remaining 64. This can be done by permuting the 
erased bits to one side of the matrix, the unerased to the other, and 
reducing the resulting matrix. (The original matrix P is non-singular. 
However, it may be that for a particular permutation it is not. This 
case can be handled in a manner similar to the one described in the Appendix.) 

For a general matrix H this would require on the order of 
$(64)^2/2$ vector operations. However, in this particular case, one can get 
by with much less. 

How much less depends upon the Hamming distance between two 
successive masks. Since the weight of each mask is identical, transforming 
one mask into another can be thought of as interchanging pairs of 
bits between the set of erased bits and the set of unerased bits. Two masks 
with a Hamming distance $d_{ij}$ between them require $d_{ij}/2$ interchanges. 
Transforming the parity check matrix corresponding to one mask to a 
matrix corresponding to the next requires the interchange of $d_{ij}/2$ pairs 
of columns and the reduction to standard form. The $64 - d_{ij}/2$ columns 
in the erased set which have not been interchanged contain only a single 
one and are already in reduced form. The number of row operations 
required to reduce the remaining columns is on the order of 
$\frac{1}{2}(d_{ij}/2)^2$ rather than $\frac{1}{2}(64)^2$. 
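
The cost estimate can be written as a small helper. The sketch assumes masks are equal-weight lists of 0/1 values and only evaluates the order-of-magnitude formula; it does not perform the matrix reduction itself.

```python
def reduction_cost(mask_a, mask_b):
    """Approximate row operations needed to update the reduced matrix
    when moving from mask_a to mask_b: (1/2) * (d/2)**2, where d is the
    Hamming distance between the two masks."""
    d = sum(x != y for x, y in zip(mask_a, mask_b))
    swaps = d // 2                     # pairs of columns interchanged
    return swaps * swaps / 2

# Hypothetical pair of weight-64 masks at Hamming distance 28.
mask_a = [1] * 64 + [0] * 64
mask_b = [0] * 14 + [1] * 50 + [1] * 14 + [0] * 50
cost = reduction_cost(mask_a, mask_b)
```

For a distance of 28 the estimate is 98 operations, against 2048 when all 64 erased columns must be reduced.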

The average distance between masks in a set depends only 
upon the distribution that was used to generate the masks and not upon 
the particular set used. Given a set of n masks with a distribution $l_i$ 
($l_i$ = number of times the ith bit is covered), the sum of the distances 
between the ith bit of a particular mask ($m = m_j$) and the ith bit of all 
other masks is 

$$ \sum_{k} d_i(m_j, m_k) = n - l_i \quad \text{if the bit is a 1} $$

$$ \phantom{\sum_{k} d_i(m_j, m_k)} = l_i \quad \text{if the bit is a 0.} $$

Since there are $l_i$ ones and $n - l_i$ zeros, the sum over all masks is 

$$ \sum_{j} \sum_{k \ne j} d_i(m_j, m_k) = l_i (n - l_i) + (n - l_i)\, l_i = 2 l_i (n - l_i) $$

Dividing by the number of terms in the summation, the average distance 
between the ith bits is 

$$ \frac{2 l_i (n - l_i)}{n(n-1)} $$

Summing over i gives the average distance between masks 

$$ \bar{d} = \sum_{i} \frac{2 l_i (n - l_i)}{n(n-1)} $$

For the $l_i$'s of distribution D, $\bar{d}$ = 28. If all the words 
were equidistant from each other (desirable from the point of view of 
covering as many error patterns as possible) then the number of vector 
additions required to reduce the parity check matrix for each mask would 
be on the order of 

$$ \frac{1}{2} \left( \frac{\bar{d}}{2} \right)^2 = 98. $$


The masks are probabilistically generated and the actual 
distances are distributed about the mean, as shown in Figure 5-7. Given 
an arbitrary ordering of masks, the number of vector additions required 
to reduce the parity check matrix m times would be even greater than 

$$ m \cdot \frac{1}{2} \left( \frac{\bar{d}}{2} \right)^2 $$

However, if the masks are sorted to an order in which the distances 
between successive masks are as small as possible, the number of 
computations will be reduced significantly. A simple sorting algorithm is 
as follows: 

Starting with an arbitrary mask, the mask closest to it 
is placed second. The sort is continued by searching the remaining 
masks and again placing at the end of the chain the one closest to 
the mask currently last. The algorithm is continued until all the 
masks have been ordered. 
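
The chaining procedure is a nearest-neighbour sort and can be sketched directly. The sketch assumes masks are packed into integers and that the first mask in the input is the arbitrary starting point; ties are broken by input order, which is a choice made here.

```python
def hamming(a, b):
    """Hamming distance between two masks packed into integers."""
    return bin(a ^ b).count("1")

def order_masks(masks):
    """Nearest-neighbour chain: start from the first mask and repeatedly
    append the remaining mask closest to the current end of the chain."""
    remaining = list(masks)
    chain = [remaining.pop(0)]
    while remaining:
        last = chain[-1]
        nearest = min(range(len(remaining)),
                      key=lambda i: hamming(last, remaining[i]))
        chain.append(remaining.pop(nearest))
    return chain

chain = order_masks([0b1100, 0b0011, 0b1010, 0b0110])
```

In this four-mask example the input order has successive distances 4, 2 and 2, while the chained order has distances 2, 2 and 2. The search is quadratic in the number of masks, which is acceptable because the ordering is computed once, before decoding.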

The resulting distribution of distances between successive 
masks is shown in Figure 5-8. With a little bit of work, a chain could 
undoubtedly be constructed which would have successive distances between 
masks of 20 or less and require only 

$$ \frac{1}{2} \left( \frac{20}{2} \right)^2 = 50 $$

vector additions to convert one reduced matrix to another. With this 
simplification this decoding method becomes competitive with other 
schemes discussed in this report. 


SECTION 6 


DECODING USING BOTH SETS OF ERASURE PATTERNS 
AND ERROR PATTERNS OF LOW WEIGHT 


The most time consuming operation in the decoding scheme using 
sets of erasure masks is the reduction of the parity check matrix each 
time a new mask is used. The number of masks required is determined 
by the desired probability that an error is completely covered by at 
least one mask. If the requirements are reduced to allow a maximum of 
one or two errors to remain exposed, the number of masks required is 
reduced by a large factor. Locating these errors requires a certain 
amount of computation, but in most cases the total computation required 
to achieve a given level of performance will be less than in the original 
algorithm. 


In order to estimate the advantages of such a scheme, consider 
the set of 1000 masks with a performance represented by curve g in 
Figure 5-4. The curve shows the number of masks required to cover, 
at least once, all the errors in a received code word a given fraction 
of the time. This curve is repeated in Figure 6-1 and is accompanied 
by curves showing the fraction of time a maximum of one, two or three 
errors are exposed. 

To achieve a level of performance equal to 1000 masks and 
no errors exposed requires 50 masks if one, 6 masks if two, and 2 masks 
if three errors remain exposed. This scheme will therefore be more 
efficient if locating single errors requires less computation than 
reducing the parity check matrix 20 times, double errors 166 times, and 
triple errors 500 times. 

Single errors may be located by assuming one of the exposed 
bits to be in error and calculating the erased bits, given this assumption, 
for each of the exposed bits. Assuming the reduced parity check matrix 
is in the form [H] = [P | I], the calculation of the candidate code 
words is straightforward. (This implies that the erased bits are all 
independent. The case for which this is not true is discussed in the 
Appendix.) The parity check equations can be written as 

$$ [H]\, a = 0 $$

or 

$$ [\, P \mid I \,] \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = 0 $$

where 

    $a_1$ = exposed bits 
    $a_2$ = erased bits assuming no errors in $a_1$ 

which is equivalent to 

$$ a_2 = [P]\, a_1 $$

Figure 6-1. Probability of errors not being covered when using 
masks of 64 bits out of 128 

Assuming an error pattern represented by ones in the vector e, the 
erased bits are equal to 

$$ a_3 = [P]\, a_1 + [P]\, e = a_2 + [P]\, e $$

The calculation of $a_2$ requires a number of vector additions 
equal to the number of ones in $a_1$. (This calculation is also required 
when decoding using erasure masks only.) Assuming an error pattern e, 
[P]e equals the sum of the corresponding columns of the matrix [P]. 
Thus, if there are k independent bits in the code word, to calculate 
the candidate code words corresponding to no and single errors requires, 
at most, 2k vector additions. 
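
The count of 2k vector additions can be seen in a short sketch. It assumes the columns of [P] are packed into integers and that `exposed` holds the hard-decision exposed bits; the names and the toy matrix are hypothetical.

```python
def single_error_candidates(p_columns, exposed):
    """Return the erased bits for the no-error assumption, followed by
    one entry for each single-bit error assumed in the exposed bits."""
    a2 = 0
    for column, bit in zip(p_columns, exposed):
        if bit:
            a2 ^= column               # a2 = [P] a1: at most k additions
    candidates = [a2]
    for column in p_columns:
        candidates.append(a2 ^ column) # a3 = a2 + [P] e: one addition each
    return candidates

# Toy example with k = 3 exposed bits and 3 erased bits.
toy = single_error_candidates([0b101, 0b011, 0b110], [1, 0, 1])
```

Each single-error candidate reuses `a2`, so the k extra candidates cost one XOR apiece.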
Considering the previous example of a (128,64) BCH code 
with 1000 masks and an average distance of 16 between masks, calculating 
the code word corresponding to each mask requires approximately 
$\frac{1}{2}(2^8) = 128$ vector additions. Calculating the candidate code 
words corresponding to single errors in the exposed bits also requires 
2k = 2(64) = 128 vector additions, at most. Since using 1000 erasure masks 
yields the same performance as using 50 erasure masks and allowing single 
errors, the second method requires one-tenth the amount of computation of 
the first. 


Extending this comparison to error patterns of greater than 
one error, there are 

$$ \binom{64}{1} + \binom{64}{2} = 2080 $$

single and double error patterns. These can be checked in the time 
required to reduce the parity check matrix 16 times, again giving a 
savings of about 10 times. The savings can be even greater if error 
patterns are ordered so that those of higher probability are used first. 
In this case some triple error patterns will precede some double error 
patterns. To get the performance equivalent to all single 
and double error patterns, only about one-half the number of patterns 
need to be used, giving a savings of 20 times over using erasure masks 
only. 
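
The savings quoted in this section follow from a simple tally. The sketch assumes, as the text implies, that reducing the matrix for one mask costs about 128 vector additions and that checking one error pattern costs about one vector addition; the function name is hypothetical.

```python
REDUCTION = 128   # vector additions to reduce the matrix for one mask

def total_additions(num_masks, patterns_per_mask):
    """Vector additions for a scheme that reduces the matrix once per
    mask and then checks a number of error patterns per mask."""
    return num_masks * (REDUCTION + patterns_per_mask)

masks_only = total_additions(1000, 0)        # erasure masks only
with_single = total_additions(50, 128)       # 50 masks plus single errors
with_double = total_additions(6, 2080)       # 6 masks plus single and double
```

The ratios `masks_only / with_single` and `masks_only / with_double` are 10 and about 9.7, in line with the savings of about 10 times stated above.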
SECTION 7 


AN EFFICIENT HYBRID ALGORITHM 


7.1 INTRODUCTION 

The previous chapters have discussed ways to improve the 
efficiency of previously known decoding algorithms which are based 
upon selecting a small set of candidate code words. By suitably com- 
bining the best features of these algorithms, an algorithm has been 
developed which is more efficient than any of those previously known. 
This algorithm uses a small number of erasure masks, assumes errors 
of low weight in the unerased bits and uses redundancy to reduce the 
number of error patterns that need to be checked. The input to the 
algorithm is a vector whose elements are quantized amplitudes of the 
symbols of the received code word and the output is an estimate of 
the transmitted code word. A flow chart of the major sections of the 
algorithm is shown in Figure 7-1. 

The first step of the algorithm, as in all the ones pre- 
viously discussed, is to sort the symbols of the received code word 
according to their absolute magnitude, permute the columns of the parity 
check matrix, and reduce it to standard form. (Efficient sorting and 
matrix reduction algorithms are described in the Appendix.) At this 
point the algorithm diverges from those previously considered. 


7.2 PARTIAL SYNDROME DECODING 

When the unerased bits are not all independent, there arises 
the possibility that certain values of the bits do not satisfy the 
parity check equations. In previous sections these cases were considered 
to be a decoding failure. However, it is possible to take advantage 
of the dependency in order to determine which error patterns in these 
bits satisfy the parity check equations. Only these patterns need 
to be used to generate candidate code words. The number of such error 
patterns of a given weight can be a small fraction of the total number 
of error patterns of that weight, greatly reducing the number of candi- 
dates required for a given level of performance. 

Calculating the error patterns which will be consistent with 
the parity check equations can best be done by considering a portion 
of the syndrome and determining the error patterns which, when added 
to the initial estimate of the received vector, will make that portion 
equal to zero. Since a code word must satisfy [H]c = 0, it will also 
satisfy this equation for any subset of rows of H. Reducing the 
parity check matrix and partitioning H as: 

$$ \begin{bmatrix} P_1 & 0 \\ P_2 & I \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = 0 $$

where $c_1$ contains the first r bits and $c_2$ the remaining n-r bits, 
it is seen that $[P_1] c_1 = 0$. Therefore, the first r bits of the code 
word must also be the solution of a set of homogeneous equations. 

Figure 7-1. Flow chart of Soft Decision Decoding Algorithm 

For an arbitrary received vector a, the product [H]a = s 
is called the syndrome and specifies the coset containing the possible 
error patterns in a. The same notion can be used when considering 
only the first r bits of the received word. Then $[P_1] a_1 = s_1$ is the 
partial syndrome which specifies the possible error patterns in $a_1$. 
Representing $a_1$ by $c_1 + e_1$, where $e_1$ is the partial error pattern 
corresponding to the partial code word $c_1$, 

$$ [P_1]\,[c_1 + e_1] = s_1 $$

or 

$$ [P_1]\, e_1 = s_1 . $$

Given $[P_1]$ and $s_1$, there are a large number of partial error patterns 
$e_1$ that will satisfy this equation. However, for the decoding algorithm 
to be considered here, it is sufficient to consider patterns of 0, 1 
or 2 errors. Note that even though the adjective "partial" is applied 
to $a_1$, $c_1$, and $e_1$, the remainder of the code word is completely 
determined from $c_1$ by $c_2 = [P_2] c_1$. The advantage of this approach is 
that candidate code words are determined only by possible error patterns 
in the most reliable received bits $a_1$. The remaining received bits $a_2$ 
do not enter at all into the calculation and can be considered erasures. 

No errors as a possible error pattern can only occur if 
the partial syndrome equals zero; single errors can occur in those 
bits whose corresponding columns of $[P_1]$ are identical to $s_1$, and double 
error patterns in those bits whose corresponding columns of $[P_1]$ sum 
to $s_1$. In general, for an error pattern of weight w to be a possibility, 
the sum of the w corresponding columns of $[P_1]$ must equal $s_1$. 

To find possible double error patterns, represent the columns 
of $[P_1]$ by vectors $p_1, p_2, \ldots, p_m$: 

$$ [P_1] = [\, p_1 \; p_2 \; p_3 \; \cdots \; p_m \,] $$

A possible double error in bits i and j must satisfy 

$$ p_i + p_j = s_1 $$

where $i \ne j$. 

Considering the vectors $p_i$ and $s_1$ as binary numbers, construct a table 
of the pairs of numbers satisfying the above equation. Under 
each entry of the table place the index of the columns which have this 
value. The possible double error patterns are then taken from this 
table. As an example, consider the reduced parity check matrix of 
Section 2.3 and the received vector 

    a = [00000 00000 10100 01000 001]. 

The first three rows of H generate the partial syndrome 

    001110101101100            1 
    111101000111010   a_1  =   1 
    011011011000101            1 

Table 7-1 shows the possible double error patterns. 


Table 7-1. Possible Double Error Patterns 

    p_i    p_j    i          j          Possible Double Error Patterns 

    000    111    none       3          none 
    001    110    8,15       4,10,12    (8,4) (8,10) (8,12) (15,4) (15,10) (15,12) 
    010    101    1,11,14    5,9,13     (1,5) (1,9) (1,13) (11,5) (11,9) (11,13) 
                                        (14,5) (14,9) (14,13) 
    011    100    2,6        7          (2,7) (6,7) 


In this example there are only 17 possible double error patterns when 
considering the partial syndromes, as compared to 

$$ \binom{15}{2} = 105 $$

total pairs of columns. The actual error pattern (11,13) is among 
these. 
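
The table lookup can be reproduced with a short sketch that uses the three matrix rows and the partial syndrome 111 of the example. The dictionary-based grouping and the function name are illustrative choices, not the report's implementation.

```python
from itertools import combinations

ROWS = ["001110101101100",
        "111101000111010",
        "011011011000101"]

# Column i (1-based) of [P1] as an integer, top row = most significant bit.
COLUMNS = {i + 1: int("".join(row[i] for row in ROWS), 2) for i in range(15)}

def double_error_patterns(columns, syndrome):
    """Return index pairs (i, j) whose columns sum to the syndrome."""
    by_value = {}
    for index, value in columns.items():
        by_value.setdefault(value, []).append(index)
    pairs = []
    for value, indices in by_value.items():
        partner = value ^ syndrome
        if partner == value:                   # zero syndrome
            pairs += list(combinations(indices, 2))
        elif value < partner:                  # visit each value pair once
            pairs += [(i, j) for i in indices
                      for j in by_value.get(partner, [])]
    return pairs

pairs = double_error_patterns(COLUMNS, 0b111)
```

Grouping the columns by value first means only matching groups are paired, so the 105 possible column pairs are never enumerated.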

The average number of t-error patterns, $E_t(m)$, that can 
occur in a partial received vector of dimension m and that are consistent 
with a given partial syndrome can be estimated using a combinatorial 
argument if it is assumed that, on the average, all possible values 
for the columns of $[P_1]$ and also all values of the syndrome are equally 
likely. 

Given the partial parity check matrix $[P_1]$ with r rows and 
m columns, there are $2^r$ possible values for the syndrome and for the 
columns of $[P_1]$. The probability that zero errors in the first m bits 
satisfy the parity check equations is the probability of a zero syndrome, 
or 

$$ E_0(m) = \frac{1}{2^r} $$

Single errors that satisfy $[P_1] e_1 = s_1$ imply that a column of $[P_1]$ 
equals the syndrome. If all syndromes are equally likely, then the 
probability that a given column equals that value is $1/2^r$. For a matrix 
of m columns 

$$ E_1(m) = \frac{m}{2^r} . $$

Possible double error patterns are those whose corresponding columns
sum to the syndrome. For a given syndrome there are $u = 2^{r-1}$ pairs
of values, a and b, whose sum equals that syndrome. If there are $n_a$
columns with value a, and $n_b$ columns with value b, then the contribution
of these columns to the total number of error patterns is $n_a n_b$.
Considering $k = n_a + n_b$ fixed, with each of the k columns equally
likely to have the value a or b, the average value of the product
$n_a n_b$ is

$$F_2(k) = \frac{1}{2^k}\sum_{n_a=0}^{k}\binom{k}{n_a}\,n_a\,(k-n_a) = \frac{k(k-1)}{4}.$$

The probability that there will be k columns out of m with values a
or b is

$$\binom{m}{k}\,\frac{2^k\,(2u-2)^{m-k}}{(2u)^m}.$$

There are u equally likely pairs, each with an average value of $n_a n_b$
equal to $F_2(k)$. Therefore, the average of the total number of double
error patterns is

$$E_2(m) = u\sum_{k=0}^{m}\binom{m}{k}\,\frac{2^k\,(2u-2)^{m-k}}{(2u)^m}\,F_2(k).$$

Simplifying this expression,

$$E_2(m) = \frac{u\,m(m-1)\,(2u)^{m-2}}{(2u)^m} = \frac{m(m-1)}{4u} = \binom{m}{2}\frac{1}{2^r}.$$

In general,

$$E_t(m) = \binom{m}{t}\frac{1}{2^r}.$$

The number of possible t-error patterns is reduced, on the average, by
a factor of $2^r$ given a redundancy of r bits.
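
As a numerical check of the averaging argument, the sketch below evaluates the sum over k for the double error case with exact rational arithmetic and compares it with the closed form $\binom{m}{t}/2^r$. The function names are illustrative, and the value $k(k-1)/4$ used for the average of $n_a n_b$ assumes that each of the k columns is equally likely to take either value of the pair.

```python
from fractions import Fraction
from math import comb


def expected_patterns(m, t, r):
    """Closed form: average number of t-error patterns, C(m, t) / 2**r."""
    return Fraction(comb(m, t), 2 ** r)


def expected_double_by_sum(m, r):
    """Average number of double error patterns from the sum over k."""
    u = 2 ** (r - 1)                      # number of value pairs (a, b)
    total = Fraction(0)
    for k in range(m + 1):
        # Probability that exactly k of the m columns take value a or b.
        prob_k = Fraction(comb(m, k) * 2 ** k * (2 * u - 2) ** (m - k),
                          (2 * u) ** m)
        # Average of n_a * n_b when n_a + n_b = k.
        total += prob_k * Fraction(k * (k - 1), 4)
    return u * total


# Parameters of the example above: m = 15 columns, r = 3 redundant bits.
print(expected_double_by_sum(15, 3), expected_patterns(15, 2, 3))
```

For the example of Table 7-1 (m = 15, r = 3) both expressions give 105/8, about 13 patterns, which is of the same order as the 17 patterns actually found.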

Considering only error patterns less than a given weight,
increasing the number of redundant bits decreases the number of error
patterns that need to be checked but increases the overall probability
of error since fewer bits are erased. To improve the performance using
this technique, one can either increase the maximum weight of the error
patterns that are checked, or one can use the erasure masks and error
patterns scheme of Section 6.

Returning to the (128,64) BCH code as an example, the probability
of an error of weight greater than t, for several values of t, is
given in Figure 7-2 as a function of the number of bits not erased
for a signal-to-noise ratio of 2.0 dB. At this signal-to-noise ratio,
in order to closely approach maximum likelihood performance, one must
at least test all triple error patterns and some four error patterns if
the redundancy is greater than 5 bits. With such a large number of error
patterns, it is more efficient to use sets of erasure masks and test
only for double error patterns.

Figure 7-2. Performance as a Function of Number of
Redundant Bits and Number of Masks.


7.3 GENERATING AND TESTING CANDIDATE CODE WORDS

For each of the possible error patterns a code word is generated
and correlated with the received vector. The form of correlation which
is most suitable is to minimize

$$\sum_i e_i L_i$$

where $\mathbf{e} = (e_1, e_2, \ldots, e_n)^T$ is the error vector and $L_i$ is the
amplitude of the ith received bit. The errors in the unerased bits are those
calculated in the previous step, while those in the erased bits can
be found by a small number of vector additions, at most equal to the
weight of the error in the unerased bits. Given the error vector, the
correlation is calculated by summing those amplitudes for which $e_i \neq 0$.
On the average, this will be about half the number of erased bits in
the summation.
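
A minimal sketch of this test is given below. It assumes that the amplitudes are supplied as non-negative magnitudes and that each candidate is represented by its error vector; the function names are illustrative only.

```python
def error_metric(error_bits, amplitudes):
    """Sum the amplitudes of the positions in which the error vector is one."""
    return sum(a for e, a in zip(error_bits, amplitudes) if e)


def best_candidate(error_vectors, amplitudes):
    """Return the error vector with the smallest metric."""
    return min(error_vectors, key=lambda e: error_metric(e, amplitudes))


# Hypothetical amplitudes and two candidate error vectors.
amplitudes = [5, 1, 2, 7]
candidates = [[0, 1, 1, 0], [1, 0, 0, 0]]
print(best_candidate(candidates, amplitudes))
```

Because only the nonzero positions of the error vector contribute, the cost per candidate is proportional to the weight of its error vector rather than to the block length.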
7.4 AN ESTIMATE OF THE COMPLEXITY OF THE ALGORITHM

As each symbol is detected, it is linked into the sorting
table. Nothing more can be done until the entire word is received,
at which time the amplitudes are sorted. This requires only one pass
through the list, which contains 128 + 64 = 192 items assuming the
amplitude is quantized to six bits magnitude.

The remainder of the algorithm, as seen in Figure 7-1, is
a single loop containing almost all of the required computation. Each 
pass through the loop is independent and can be done in any order or 
in parallel. Considering only a single pass, the first operation is 
to permute the columns of the parity check matrix and reduce the permuted 
matrix to standard form. This is most easily done by copying the columns, 
in permuted order, into the area of memory that will be accessed by 
the reduction algorithm. 

The unpermuted H matrix is stored in reduced form so that
half the columns contain only a single one. After the initial permutation,
on the average, half of these columns correspond to erased bits and
can be ignored in the reduction. The number of checks that must be
made to see whether there is a zero or one in a particular bit position
in the pivotal row is approximately 496. If half of these are ones,
248 vector additions are required.

The calculation of the correlation requires approximately 
30 arithmetic additions per candidate code word. Because of the large 
number of operations required, a real time decoder using this algorithm 
should have a special arithmetic unit performing the summations. Ideally, 
the correlator should work as fast as the candidate code word generator 
so that each code word may be checked as it is generated. It may be 
possible to perform an approximate calculation, with negligible loss 
of performance, directly from the error vector based upon its weight 
and the position of the errors but this has not yet been investigated. 

The redundancy corresponding to the minimum amount of computation
requires a number of assumptions. First, a received signal-to-noise
ratio of 2.0 dB is still assumed. At this signal-to-noise ratio the
bit probability of error of a maximum likelihood decoder is $10^{-3}$, which
is close to the tolerable limit for most applications. For higher
signal-to-noise ratios, the performance of decoding algorithms of this
type with a fixed set of parameters generally gets better, that is,
approaches the maximum likelihood performance more closely. Second,
the level of performance that will be assumed for this signal-to-noise
ratio is a probability of $0.9 \times 10^{-3}$ that an error pattern will not
be covered or corrected. With these assumptions the minimum occurs for
a redundancy of 6. At this redundancy 20 erasure masks are required
to achieve the desired performance.

The two major units of the decoder are the matrix reducer
and the candidate code word generator and correlator. Using the
parameters mentioned previously, 20 matrix reductions are performed per
received code word and approximately 40 candidate code words are
generated per matrix reduction. If the matrix reductions are done serially,
the time required for these two operations should be the same so that
each unit will not have to wait for the other to complete its computation.

Assuming the speed of the algorithm is fixed by the time
required for the matrix reduction, the time required to decode a single
code word is approximately the time required to reduce the matrix 20
times. A conservative estimate is that the initial reduction can be
done in 1000 steps and subsequent reductions in 200 steps, or about 5000
steps overall. At a computation rate of 100 nanoseconds per step,
one word can be decoded in 0.5 milliseconds. There are 64 information
bits per code word so that at this computation rate a data rate of
128 kbits/second can be handled. Higher data rates will require a
faster rate of computation or more parallelism, while lower data rates
can shift some of the computational overhead and bookkeeping from special
purpose to general purpose hardware.
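
The arithmetic behind this estimate can be laid out as a short calculation. The sketch below uses only the figures quoted above (20 reductions, 1000 and 200 steps, 100 nanoseconds per step, 64 information bits); the function names are illustrative.

```python
def reduction_steps(reductions=20, first_steps=1000, later_steps=200):
    """Total steps for one initial reduction plus the subsequent ones."""
    return first_steps + (reductions - 1) * later_steps


def data_rate_bits_per_second(steps, step_ns=100, info_bits=64):
    """Information bits decoded per second (integer division, rounds down)."""
    return info_bits * 10 ** 9 // (steps * step_ns)


exact_steps = reduction_steps()                  # 1000 + 19 * 200
print(exact_steps, data_rate_bits_per_second(5000))
```

The exact step count is 4800, which the text rounds up to 5000; the rounded figure corresponds to 0.5 milliseconds per code word and 128 kbits/second.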
SECTION 8


SUMMARY AND CONCLUSIONS


Approximate maximum likelihood decoding algorithms based
upon selecting a small set of candidate code words to be correlated
with the received vector can give close to optimum performance with
a reasonable amount of computation. This report describes the search
for computationally efficient algorithms of this type. Emphasis has
been placed upon the (128,64) BCH code of minimum distance 22 since
it is one of the shortest codes whose maximum likelihood performance
at low signal-to-noise ratios is better, by more than 1 dB, than the
rate 1/2, encoding constraint length 7 convolutional code in wide use
today.


The most efficient algorithm found for decoding this code 
is one which sorts the received bits according to their estimated 
probability of error and then selects candidate code words by: 

(1) using a small number of erasure masks, 

(2) assuming errors of weight two or less in the unerased 
bits, 

(3) using six bits of redundancy to reduce the number 
of error patterns that need be checked, and 

(4) using the computationally most efficient algorithm 
at each step of the decoding. 

This algorithm is competitive with the Viterbi decoding
of the (7,1/2) convolutional code in number of computations though
not in simplicity of the program.

The principle found most useful in developing the algorithm
is to take as much advantage as possible of the sorting of the received
bits according to their estimated probability of error. This is best
illustrated by the increase of efficiency when using weighted masks
rather than combinatorially generated ones which consider all erased
bits equally. In addition, maximum utilization should be made of the
structure of the code. A possible explanation of the relative efficiency
of this decoding algorithm is that, in addition to the use of linearity
of the code to calculate the erased bits, it is also used to correct
errors in the unerased ones. E. Berlekamp's soft decision decoding
algorithm (Reference 1-7) also takes advantage of the cyclic and
algebraic structure of the code. If a way can be found to use these properties
for decoding random errors, it is highly probable that such a scheme will
be even more efficient for a given level of performance than those
described in this report.

In the algorithms of Sections 5, 6 and 7, sets of weighted
erasure masks were used. These sets were first defined by L. Baumert
and R. McEliece (Reference 1-4) and were constructed using a random
number generator. For sets with a large number of masks this is the
easiest method of constructing them and, as Figure 5-6 shows, it is
probably close to the best. However, for sets with a small number
of masks, as those of Section 7, a more deterministic method may be
superior. Constructing such sets, which have given covering and
distance properties, is an interesting combinatorial problem and should
be pursued further.

The advantage of the algorithms described in this report
over that of the Viterbi decoding of convolutional codes is due mainly
to the ability to decode codes of higher Q for a given computational
complexity. It is very likely that an exponential increase of
complexity will be necessary if an attempt is made to decode larger codes
of higher Q, just as is now the case with Viterbi decoding. The algorithms
discussed here are useful for codes up to the length for which the
exponential increase in complexity begins. It would be worthwhile
to find the length for which this occurs, for it is at this point that
these algorithms operate at their best.

APPENDIX


EFFICIENT ALGORITHMS FOR SORTING AND MATRIX REDUCTION 


A.1 INTRODUCTION 

The sorting of real numbers can be done efficiently by
many algorithms all requiring, on the average, n log n operations to
sort n items (Reference A-1). The sorting can be done even more
efficiently with negligible loss of performance by quantizing the
amplitudes to a fairly large number of levels. Representative of algorithms
which sort into a finite number of bins is the linksort, which requires
on the order of L + N (L = number of bins, N = number of items to be
sorted) operations.

Reducing the parity check matrix to standard form can best
be done by Gaussian reduction. In this case the matrix elements can
only have the values 0 or 1 and the computation is best done as vector
exclusive-OR addition. In the general case the reduction requires
on the order of $n^2/2$ such operations, where n = the rank of the matrix.
As seen in Section 5.3, the matrix can be arranged beforehand in such
a way that the number of operations is considerably less than this.


A.2 SORTING THE RECEIVED SYMBOLS ACCORDING TO THEIR RELIABILITY

Common to all the algorithms analyzed here is the sorting
of the received symbols according to their reliability. It is assumed
that the received signal has been demodulated and reduced to a form
in which each symbol is represented by a real number of the form
$y_i = s_i + n$, where $s_i = \pm 1$ and n is an independent sample of a zero
mean Gaussian process with variance

$$\sigma^2 = \frac{N_0}{E_b}\cdot\frac{1}{2R}.$$

This can be shown to be a sufficient statistic; that is, it represents
the received signal without loss of information.

A good measure of the reliability of each bit is the absolute
magnitude of its log likelihood ratio

$$L_i = \ln\frac{\Pr(y_i \mid x_i = +1)}{\Pr(y_i \mid x_i = -1)}.$$

This measure is intuitively satisfying and, in addition, is required
for estimating the transmitted code word. For the memoryless Gaussian
channel

$$L_i = \ln\frac{\exp\left[-\dfrac{(y_i-1)^2}{2\sigma^2}\right]}{\exp\left[-\dfrac{(y_i+1)^2}{2\sigma^2}\right]} = \frac{2}{\sigma^2}\,y_i$$

so that the absolute magnitude of the received symbol is a measure
of the reliability of that symbol and the required sorting can be done
on the received vector directly.
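
The consequence of this result can be illustrated with a short sketch: since the log likelihood ratio is proportional to the received value, ordering by the magnitude of the received value is the same as ordering by reliability. The function names and the sample values are illustrative, and an ordinary library sort is used here in place of the linksort.

```python
def log_likelihood_ratio(y, sigma2):
    """Log likelihood ratio of a +/-1 symbol in Gaussian noise of variance sigma2."""
    return 2.0 * y / sigma2


def reliability_order(received):
    """Indices of the received symbols, least reliable (smallest |y|) first."""
    return sorted(range(len(received)), key=lambda i: abs(received[i]))


received = [0.9, -0.1, 1.4, -0.5]   # hypothetical demodulator outputs
print(reliability_order(received))
```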

An efficient algorithm for sorting amplitudes into a small 
number of bins is the linksort. The version described here sorts integers 
according to absolute value as required for ordering the received symbols 
according to their relative reliability. The algorithm requires only 
a single pass through the data to construct a table and a single pass 
through the table to output the integers in sorted order along with 
their input index. The number of memory locations required is the 
number of items to be sorted plus the absolute value of the largest 
integer on the data list. 

A flow graph of the algorithm is shown in Figure A-1. 

The variables used in the algorithm are:

n = number of items to be sorted

m = absolute value of largest integer in data list

the list of integers to be sorted, indexed by i, 1 ≤ i ≤ n

A_i = output list of integers sorted according to absolute
value

the output list of indices associated with each A_i

S_j = special links used internally in algorithm, 0 ≤ j ≤ m

The algorithm can be divided into three stages:

(1) Initialize the chain. The S_j are the special links
in the chain which point to data of absolute amplitude
j. At the beginning of the algorithm each special
link points to the next special link in the chain.
The value of z can be any number larger than n so
as to be able to distinguish between pointers to
special links and pointers to data links. If z is
the first power of 2 greater than n, then special
links will be characterized by a 1 in the highest
order bit. This can be thought of as a special sign
bit.

(2) Link data into the chain. The ith data symbol has
absolute magnitude j. The contents of memory location
S_j are stored in L_i and the index i, along with the
sign of the ith data symbol, is stored in S_j. The
memory locations can either contain a special pointer,
indicated by the z bit, pointing to the next amplitude,
or an index i, depending on whether or not the amplitude
j has been encountered before. These two cases can
be schematically represented as in Figure A-2.

(3) Construct the output amplitude and index chains. The
chain is read out starting at location S_0. Pointers
with the z bit equal to zero correspond to data of
amplitude j and position index i. Those with the z
bit equal to one correspond to a special link indicating
an amplitude increase of one. The algorithm is completed
when the last link in the chain is detected.

Figure A-1. Flow Chart of the Linksort Algorithm

Figure A-2. Schematic Representation of the Links in
the Linksort Algorithm

In order to demonstrate this algorithm, a step-by-step
development of the arrays {L} and {S} is shown in Figure A-3. The
input list contains 12 numbers with maximum amplitude 6. The value
of 100 is used for z rather than a power of 2 since, in an example,
decimal numbers are visually more convenient.

Figure A-3. Development of the Chain in the Linksort Algorithm
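
The three stages can be sketched in Python as follows. This is an illustration under stated simplifications rather than a transcription of the flow chart of Figure A-1: indices are 0-based, z is taken as n + 1, the sign is read back from the input list instead of being stored in the link, and the function name `linksort` and its return format are assumptions made for the example.

```python
def linksort(data):
    """Sort integers by absolute value; return (sorted values, input indices)."""
    n = len(data)
    m = max((abs(v) for v in data), default=0)
    z = n + 1  # any pointer >= z denotes a special link, not a data index

    # Stage 1: special link j initially points to special link j + 1.
    S = [z + j + 1 for j in range(m + 1)]
    L = [0] * n

    # Stage 2: link each data item into the chain of its amplitude.
    for i, value in enumerate(data):
        j = abs(value)
        L[i] = S[j]   # the old contents of S_j become the link of item i
        S[j] = i      # S_j now points to item i

    # Stage 3: read the chain out, starting at S_0.
    values, indices = [], []
    pointer = S[0]
    while True:
        if pointer < z:              # data link: output the item, follow it
            values.append(data[pointer])
            indices.append(pointer)
            pointer = L[pointer]
        else:                        # special link: amplitude increases by one
            j = pointer - z
            if j > m:                # past the last amplitude: chain finished
                break
            pointer = S[j]
    return values, indices


values, indices = linksort([3, -1, 0, 2, -3, 1])
print(values, indices)
```

Items of equal amplitude emerge in the reverse of their input order, because each new item is linked in at the head of the chain for its amplitude.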

In an actual decoding algorithm many more amplitude levels
would be used. The number of memory locations required for the arrays
is 3n + q (where n = the number of symbols to be sorted and q = the
number of amplitude levels). The amount of computation required in
building the linked list is independent of q, while reading out the
data requires n + q steps. For a code of block length 128, q equal
to 64 or 128, corresponding to 6 or 7 bits plus a sign bit, is
reasonable. The advantages of fine quantization are that the decoder is
much less sensitive to changes in noise and signal power and the
quantization loss, which is on the order of 0.2 to 0.25 dB for the 3 bit
quantization usually used in soft decision decoders, is negligible.


A.3 REDUCING THE PARITY CHECK MATRIX TO STANDARD FORM

The decoding algorithms which assume erasures calculate
the erased bits from a linear combination of the unerased ones. By
permuting the columns of the parity check matrix which correspond to
the erased bits to the right, the matrix may then be partitioned into
two parts, an (n - k) x k partition corresponding to the known bits,
$[P_1]$, and an (n - k) x (n - k) partition corresponding to the unknown
ones, $[P_2]$.


The entire parity check matrix is then reduced by row
operations until $[P_2]$ is in triangular form. The parity check equations
require

$$[H]\,\mathbf{a} = \mathbf{0} \quad\text{or}\quad [\,P_1 \mid P_2\,]\begin{bmatrix}\mathbf{a}_1\\ \mathbf{a}_2\end{bmatrix} = \mathbf{0}.$$

After reduction this becomes

$$\left[\;P_1\;\middle|\;\begin{matrix}1&&&\\ \times&1&&\\ \vdots&&\ddots&\\ \times&\cdots&\times&1\end{matrix}\;\right]\begin{bmatrix}\mathbf{a}_1\\ \mathbf{a}_2\end{bmatrix} = \mathbf{0}$$

where $\times$ denotes an entry that may be 0 or 1. Considered as a set of
simultaneous linear equations with all the bits of $\mathbf{a}_1$ known, one
can, starting with the first bit of $\mathbf{a}_2$, calculate the values of the
erased bits from those of $\mathbf{a}_1$ and the previously calculated ones of
$\mathbf{a}_2$. The procedure is straightforward unless there is a zero element
along the diagonal of $[P_2]$. This indicates that the bit corresponding
to the column containing the zero is independent of the bits in $\mathbf{a}_1$
and that there exists a dependency relationship among the bits of
$\mathbf{a}_1$ so that not all values are permissible.

In order to handle this case simply, it is worthwhile
to reduce the matrix further. By row operations all the off-diagonal
terms of the columns of $[P_2]$ which contain a one along the diagonal
can be reduced to zero, and all the rows of $[P_2]$ which contain a zero
along the diagonal can be reduced completely to zero. The form of
the H matrix is now

$$[H] = \left[\;P_1\;\middle|\;\begin{matrix}1&&\times&&\\ &\ddots&\vdots&&\\ 0&\cdots&0&\cdots&0\\ &&\vdots&\ddots&\\ &&\times&&1\end{matrix}\;\right]$$

in which the row with a zero on the diagonal is all zeros within $[P_2]$
and the corresponding column may contain ones off the diagonal.
The matrix can be simplified by permuting the zero row to the top,
and the zero column, along with its corresponding bit in $\mathbf{a}_2$, to the
left. The parity check equations are then

$$[H]\,\mathbf{a} = \begin{bmatrix}P_{1a} & 0 & 0\\ P_{1b} & P_3 & I\end{bmatrix}\begin{bmatrix}\mathbf{a}_1\\ \mathbf{a}_{2a}\\ \mathbf{a}_{2b}\end{bmatrix} = \mathbf{0}.$$

This corresponds to three sets of equations:

(1) $[P_{1a}]\,\mathbf{a}_1 = \mathbf{0}$

(2) $\mathbf{a}_{2a}$ arbitrary

(3) $\mathbf{a}_{2b} = [P_{1b}]\,\mathbf{a}_1 + [P_3]\,\mathbf{a}_{2a}$

The first equation expresses the dependency relationships
among the unerased bits which must be satisfied if a solution is to
exist. If the bits of $\mathbf{a}_1$ satisfy these equations, then the number of
possible code words with these unerased bits is $2^r$ (r = the number
of bits in $\mathbf{a}_{2a}$), all of which can be used for candidate code words.

The reduction of the parity check matrix requires a significant
portion of the total computation time of the algorithm so that it is
worthwhile to seek efficient techniques to perform this calculation.
First, permuting the rows and columns need not actually be done. It
is sufficient to keep a table of the proper order of the row and column
indices. Second, most of the operations in reducing the matrix and
generating candidate code words are performed most efficiently as vector
additions. At various stages of the algorithm both rows and columns
are treated as vectors, and converting from one to the other in the
computer is a time consuming operation. This is not necessary, however,
since row operations may be performed on data stored as columns. The
flowchart of a matrix reduction algorithm which represents each column
of the H matrix as a binary vector stored in a single word of memory
is shown in Figure A-4. The algorithm reduces this matrix by row
operations without converting from column vector to row vector form.

With this algorithm, the time required to arrange the data
for calculation is a minimum and the overall decoding time can be estimated
from the number of vector additions that have to be performed.

Figure A-4. Matrix Reduction Algorithm
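
One way to carry out row operations on column-stored data is sketched below in Python. It is not a transcription of Figure A-4; it is an illustration under the following assumptions: each column is held in one integer with bit p standing for row p, the erased columns are linearly independent (the zero-diagonal case is only reported, not handled), and the function names and the small (7,4) Hamming code used as test data are chosen for the example. Adding the pivot row to every other row that has a one in the pivot column amounts to a single exclusive-OR per affected column.

```python
def reduce_columns(columns, erased):
    """Reduce the erased columns to unit vectors using row operations only."""
    cols = list(columns)
    pivot_rows = []
    used = 0                                  # rows already chosen as pivots
    for e in erased:
        free = cols[e] & ~used
        if free == 0:
            raise ValueError("erased columns are linearly dependent")
        p = (free & -free).bit_length() - 1   # lowest unused row holding a one
        mask = cols[e] ^ (1 << p)             # other rows with a one in column e
        for c in range(len(cols)):
            if (cols[c] >> p) & 1:            # row p has a one in column c
                cols[c] ^= mask               # add row p to every row in mask
        used |= 1 << p
        pivot_rows.append(p)
    return cols, pivot_rows


def fill_erasures(cols, pivot_rows, erased, bits):
    """Compute the erased bits from the known ones with the reduced matrix."""
    erased_set = set(erased)
    total = 0
    for c, bit in enumerate(bits):
        if bit and c not in erased_set:
            total ^= cols[c]                  # one vector addition per known one
    word = list(bits)
    for e, p in zip(erased, pivot_rows):
        word[e] = (total >> p) & 1
    return word


# (7,4) Hamming code: column c of H is the binary number c + 1.
H_COLUMNS = [1, 2, 3, 4, 5, 6, 7]
ERASED = [2, 4, 6]
reduced, pivots = reduce_columns(H_COLUMNS, ERASED)
# Code word 1110000 with the erased positions set to a placeholder 0.
print(fill_erasures(reduced, pivots, ERASED, [1, 1, 0, 0, 0, 0, 0]))
```

The number of vector additions needed to fill the erased bits equals the number of ones among the known bits that are summed, which is why the decoding time can be estimated by counting vector additions.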